Results 1 - 20 of 49
1.
Histopathology ; 84(6): 915-923, 2024 May.
Article in English | MEDLINE | ID: mdl-38433289

ABSTRACT

A growing body of research supports stromal tumour-infiltrating lymphocyte (TIL) density in breast cancer as a robust prognostic and predictive biomarker. The gold standard for stromal TIL density quantitation in breast cancer is pathologist visual assessment of haematoxylin and eosin-stained slides. Artificial intelligence/machine-learning algorithms are in development to automate the stromal TIL scoring process and must be validated against a reference standard such as pathologist visual assessment. Visual TIL assessment, however, can suffer from significant interobserver variability. To improve interobserver agreement, regulatory science experts at the US Food and Drug Administration partnered with academic pathologists internationally to create a freely available online continuing medical education (CME) course that trains pathologists to assess breast cancer stromal TILs in an interactive format with expert commentary. Here we describe and provide a user guide to this CME course, whose content was designed to improve pathologist accuracy in scoring breast cancer TILs. We also suggest subsequent steps to translate this knowledge into clinical practice through proficiency testing.


Subject(s)
Breast Neoplasms , Humans , Female , Pathologists , Lymphocytes, Tumor-Infiltrating , Artificial Intelligence , Prognosis
2.
Mod Pathol ; 37(4): 100439, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38286221

ABSTRACT

This work puts forth and demonstrates the utility of a reporting framework for collecting and evaluating annotations of medical images used for training and testing artificial intelligence (AI) models in assisting detection and diagnosis. AI has unique reporting requirements, as shown by the AI extensions to the Consolidated Standards of Reporting Trials (CONSORT) and Standard Protocol Items: Recommendations for Interventional Trials (SPIRIT) checklists and the proposed AI extensions to the Standards for Reporting Diagnostic Accuracy (STARD) and Transparent Reporting of a Multivariable Prediction model for Individual Prognosis or Diagnosis (TRIPOD) checklists. AI for detection and/or diagnostic image analysis requires complete, reproducible, and transparent reporting of the annotations and metadata used in training and testing data sets. In an earlier work by other researchers, an annotation workflow and quality checklist for computational pathology annotations were proposed. In this manuscript, we operationalize this workflow into an evaluable quality checklist that applies to any reader-interpreted medical images, and we demonstrate its use for an annotation effort in digital pathology. We refer to this quality framework as the Collection and Evaluation of Annotations for Reproducible Reporting of Artificial Intelligence (CLEARR-AI).


Subject(s)
Artificial Intelligence , Checklist , Humans , Prognosis , Image Processing, Computer-Assisted , Research Design
3.
J Pathol ; 261(4): 378-384, 2023 12.
Article in English | MEDLINE | ID: mdl-37794720

ABSTRACT

Quantifying tumor-infiltrating lymphocytes (TILs) in breast cancer tumors is a challenging task for pathologists. With the advent of whole slide imaging that digitizes glass slides, it is possible to apply computational models to quantify TILs for pathologists. Development of computational models requires significant time, expertise, consensus, and investment. To reduce this burden, we are preparing a dataset for developers to validate their models and a proposal to the Medical Device Development Tool (MDDT) program in the Center for Devices and Radiological Health of the U.S. Food and Drug Administration (FDA). If the FDA qualifies the dataset for its submitted context of use, model developers can use it in a regulatory submission within the qualified context of use without additional documentation. Our dataset aims at reducing the regulatory burden placed on developers of models that estimate the density of TILs and will allow head-to-head comparison of multiple computational models on the same data. In this paper, we discuss the MDDT preparation and submission process, including the feedback we received from our initial interactions with the FDA and propose how a qualified MDDT validation dataset could be a mechanism for open, fair, and consistent measures of computational model performance. Our experiences will help the community understand what the FDA considers relevant and appropriate (from the perspective of the submitter), at the early stages of the MDDT submission process, for validating stromal TIL density estimation models and other potential computational models. © 2023 The Authors. The Journal of Pathology published by John Wiley & Sons Ltd on behalf of The Pathological Society of Great Britain and Ireland. This article has been contributed to by U.S. Government employees and their work is in the public domain in the USA.


Subject(s)
Lymphocytes, Tumor-Infiltrating , Pathologists , United States , Humans , United States Food and Drug Administration , Lymphocytes, Tumor-Infiltrating/pathology , United Kingdom
4.
J Med Imaging (Bellingham) ; 10(5): 051804, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37361549

ABSTRACT

Purpose: To introduce developers to medical device regulatory processes and data considerations in artificial intelligence and machine learning (AI/ML) device submissions and to discuss ongoing AI/ML-related regulatory challenges and activities. Approach: AI/ML technologies are being used in an increasing number of medical imaging devices, and the fast evolution of these technologies presents novel regulatory challenges. We provide AI/ML developers with an introduction to U.S. Food and Drug Administration (FDA) regulatory concepts, processes, and fundamental assessments for a wide range of medical imaging AI/ML device types. Results: The device type and appropriate premarket regulatory pathway for an AI/ML device are based on the level of risk associated with the device and informed by both its technological characteristics and intended use. AI/ML device submissions contain a wide array of information and testing to facilitate the review process; the model description, data, nonclinical testing, and multi-reader multi-case testing are critical aspects of review for many submissions. The agency is also involved in AI/ML-related activities that support guidance document development, good machine learning practice development, AI/ML transparency, AI/ML regulatory research, and real-world performance assessment. Conclusion: FDA's AI/ML regulatory and scientific efforts support the joint goals of ensuring patients have access to safe and effective AI/ML devices over the entire device lifecycle and stimulating medical AI/ML innovation.

5.
J Med Imaging (Bellingham) ; 9(4): 047501, 2022 Jul.
Article in English | MEDLINE | ID: mdl-35911208

ABSTRACT

Purpose: Validation of artificial intelligence (AI) algorithms in digital pathology with a reference standard is necessary before widespread clinical use, but few examples focus on creating a reference standard based on pathologist annotations. This work assesses the results of a pilot study that collects density estimates of stromal tumor-infiltrating lymphocytes (sTILs) in breast cancer biopsy specimens. This work will inform the creation of a validation dataset for the evaluation of AI algorithms fit for a regulatory purpose. Approach: Collaborators and crowdsourced pathologists contributed glass slides, digital images, and annotations. Here, "annotations" refer to any marks, segmentations, measurements, or labels a pathologist adds to a report, image, region of interest (ROI), or biological feature. Pathologists estimated sTILs density in 640 ROIs from hematoxylin and eosin stained slides of 64 patients via two modalities: an optical light microscope and two digital image viewing platforms. Results: The pilot study generated 7373 sTILs density estimates from 29 pathologists. Analysis of annotations found the variability of density estimates per ROI increases with the mean; the root mean square differences were 4.46, 14.25, and 26.25 as the mean density ranged from 0% to 10%, 11% to 40%, and 41% to 100%, respectively. The pilot study informs three areas of improvement for future work: technical workflows, annotation platforms, and agreement analysis methods. Upgrades to the workflows and platforms will improve operability and increase annotation speed and consistency. Conclusions: Exploratory data analysis demonstrates the need to develop new statistical approaches for agreement. The pilot study dataset and analysis methods are publicly available to allow community feedback. The development and results of the validation dataset will be publicly available to serve as an instructive tool that can be replicated by developers and researchers.
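The binned root-mean-square (RMS) differences reported in this abstract can be sketched as follows. This is a minimal illustration with hypothetical paired estimates; the function name, data, and the 0-10/11-40/41-100 bin edges mirror the ranges quoted above but are otherwise illustrative:

```python
import math

def rms_diff_by_bin(est_a, est_b, bins=((0, 10), (11, 40), (41, 100))):
    """RMS difference of paired sTILs density estimates (in percent),
    binned by the mean of each pair of estimates."""
    out = {}
    for lo, hi in bins:
        sq = [(a - b) ** 2 for a, b in zip(est_a, est_b) if lo <= (a + b) / 2 <= hi]
        out[(lo, hi)] = math.sqrt(sum(sq) / len(sq)) if sq else float("nan")
    return out

# Hypothetical paired estimates from two pathologists:
r = rms_diff_by_bin([5, 20, 80], [5, 30, 60])
```

The pattern reported above (variability growing with mean density) would appear as larger RMS values in the higher bins.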

6.
Stat Methods Med Res ; 31(11): 2069-2086, 2022 11.
Article in English | MEDLINE | ID: mdl-35790462

ABSTRACT

The area under the receiver operating characteristic curve (AUC) is widely used in evaluating diagnostic performance for many clinical tasks. It is still challenging to evaluate the reading performance of distinguishing between positive and negative regions of interest (ROIs) in the nested-data problem, where multiple ROIs are nested within the cases. To address this issue, we identify two kinds of AUC estimators, within-cases AUC and between-cases AUC. We focus on the between-cases AUC estimator, since our main research interest is in patient-level diagnostic performance rather than location-level performance (the ability to separate ROIs with and without disease within each patient). Another reason is that as the case number increases, the number of between-cases paired ROIs is much larger than the number of within-cases ROIs. We provide estimators for the variance of the between-cases AUC and for the covariance when there are two readers. We derive and prove the above estimators' theoretical values based on a simulation model and characterize their behavior using Monte Carlo simulation results. We also provide a real-data example. Moreover, we connect the distribution-based simulation model with the simulation model based on the linear mixed-effect model, which helps better understand the sources of variation in the simulated dataset.
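A minimal sketch of the between-cases AUC idea described here, assuming ROI scores are grouped by case; the data structure and function name are illustrative, not the authors' implementation:

```python
from itertools import permutations

def between_cases_auc(cases):
    """Between-cases AUC estimate: compare diseased ROI scores from one case
    with non-diseased ROI scores from a *different* case (ties count 0.5).
    cases: dict mapping case_id -> (pos_scores, neg_scores)."""
    num, den = 0.0, 0
    for a, b in permutations(cases, 2):
        for p in cases[a][0]:      # positive ROIs of case a
            for n in cases[b][1]:  # negative ROIs of a different case b
                num += 1.0 if p > n else 0.5 if p == n else 0.0
                den += 1
    return num / den if den else float("nan")
```

Restricting the pairing to ROIs from different cases is what targets patient-level rather than location-level performance, and the number of such pairs grows much faster with the case count than the within-cases pairs do.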


Subject(s)
Area Under Curve , Humans , ROC Curve , Monte Carlo Method , Computer Simulation , Linear Models
7.
JNCI Cancer Spectr ; 6(1)2022 01 05.
Article in English | MEDLINE | ID: mdl-35699495

ABSTRACT

Medical image interpretation is central to detecting, diagnosing, and staging cancer and many other disorders. At a time when medical imaging is being transformed by digital technologies and artificial intelligence, understanding the basic perceptual and cognitive processes underlying medical image interpretation is vital for increasing diagnosticians' accuracy and performance, improving patient outcomes, and reducing diagnostician burnout. Medical image perception remains substantially understudied. In September 2019, the National Cancer Institute convened a multidisciplinary panel of radiologists and pathologists together with researchers working in medical image perception and adjacent fields of cognition and perception for the "Cognition and Medical Image Perception Think Tank." The Think Tank's key objectives were to identify critical unsolved problems related to visual perception in pathology and radiology from the perspective of diagnosticians, discuss how these clinically relevant questions could be addressed through cognitive and perception research, identify barriers and solutions for transdisciplinary collaborations, define ways to elevate the profile of cognition and perception research within the medical image community, determine the greatest needs to advance medical image perception, and outline future goals and strategies to evaluate progress. The Think Tank emphasized diagnosticians' perspectives as the crucial starting point for medical image perception research, with diagnosticians describing their interpretation process and identifying perceptual and cognitive problems that arise. This article reports the deliberations of the Think Tank participants to address these objectives and highlight opportunities to expand research on medical image perception.


Subject(s)
Artificial Intelligence , Radiology , Cognition , Diagnostic Imaging , Humans , Radiology/methods , Visual Perception
8.
Cancers (Basel) ; 14(10)2022 May 17.
Article in English | MEDLINE | ID: mdl-35626070

ABSTRACT

The High Throughput Truthing project aims to develop a dataset for validating artificial intelligence and machine learning models (AI/ML) fit for regulatory purposes. The context of this AI/ML validation dataset is the reporting of stromal tumor-infiltrating lymphocytes (sTILs) density evaluations in hematoxylin and eosin-stained invasive breast cancer biopsy specimens. After completing the pilot study, we found notable variability in the sTILs estimates as well as inconsistencies and gaps in the provided training to pathologists. Using the pilot study data and an expert panel, we created custom training materials to improve pathologist annotation quality for the pivotal study. We categorized regions of interest (ROIs) based on their mean sTILs density and selected ROIs with the highest and lowest sTILs variability. In a series of eight one-hour sessions, the expert panel reviewed each ROI and provided verbal density estimates and comments on features that confounded the sTILs evaluation. We aggregated and shaped the comments to identify pitfalls and instructions to improve our training materials. From these selected ROIs, we created a training set and proficiency test set to improve pathologist training with the goal to improve data collection for the pivotal study. We are not exploring AI/ML performance in this paper. Instead, we are creating materials that will train crowd-sourced pathologists to be the reference standard in a pivotal study to create an AI/ML model validation dataset. The issues discussed here are also important for clinicians to understand about the evaluation of sTILs in clinical practice and can provide insight to developers of AI/ML models.

10.
J Pathol Inform ; 12: 45, 2021.
Article in English | MEDLINE | ID: mdl-34881099

ABSTRACT

PURPOSE: Validating artificial intelligence algorithms for clinical use in medical images is a challenging endeavor due to a lack of standard reference data (ground truth). This topic typically occupies a small portion of the discussion in research papers since most of the efforts are focused on developing novel algorithms. In this work, we present a collaboration to create a validation dataset of pathologist annotations for algorithms that process whole slide images. We focus on data collection and evaluation of algorithm performance in the context of estimating the density of stromal tumor-infiltrating lymphocytes (sTILs) in breast cancer. METHODS: We digitized 64 glass slides of hematoxylin- and eosin-stained invasive ductal carcinoma core biopsies prepared at a single clinical site. A collaborating pathologist selected 10 regions of interest (ROIs) per slide for evaluation. We created training materials and workflows to crowdsource pathologist image annotations on two modes: an optical microscope and two digital platforms. The microscope platform allows the same ROIs to be evaluated in both modes. The workflows collect the ROI type, a decision on whether the ROI is appropriate for estimating the density of sTILs, and if appropriate, the sTIL density value for that ROI. RESULTS: In total, 19 pathologists made 1645 ROI evaluations during a data collection event and the following 2 weeks. The pilot study yielded an abundant number of cases with nominal sTIL infiltration. Furthermore, we found that the sTIL densities are correlated within a case, and there is notable pathologist variability. Consequently, we outline plans to improve our ROI and case sampling methods. We also outline statistical methods to account for ROI correlations within a case and pathologist variability when validating an algorithm. CONCLUSION: We have built workflows for efficient data collection and tested them in a pilot study. 
As we prepare for pivotal studies, we will investigate methods to use the dataset as an external validation tool for algorithms. We will also consider what it will take for the dataset to be fit for a regulatory purpose: study size, patient population, and pathologist training and qualifications. To this end, we will elicit feedback from the Food and Drug Administration via the Medical Device Development Tool program and from the broader digital pathology and AI community. Ultimately, we intend to share the dataset, statistical methods, and lessons learned.

11.
J Pathol Inform ; 11: 22, 2020.
Article in English | MEDLINE | ID: mdl-33042601

ABSTRACT

Unlocking the full potential of pathology data by gaining computational access to histological pixel data and metadata (digital pathology) is one of the key promises of computational pathology. Despite scientific progress and several regulatory approvals for primary diagnosis using whole-slide imaging, true clinical adoption at scale is slower than anticipated. In the U.S., advances in digital pathology are often siloed pursuits by individual stakeholders, and to our knowledge, there has not been a systematic approach to advance the field through a regulatory science initiative. The Alliance for Digital Pathology (the Alliance) is a recently established, volunteer, collaborative, regulatory science initiative to standardize digital pathology processes to speed up innovation to patients. The purpose is: (1) to account for the patient perspective by including patient advocacy; (2) to investigate and develop methods and tools for the evaluation of effectiveness, safety, and quality to specify risks and benefits in the precompetitive phase; (3) to help strategize the sequence of clinically meaningful deliverables; (4) to encourage and streamline the development of ground-truth data sets for machine learning model development and validation; and (5) to clarify regulatory pathways by investigating relevant regulatory science questions. The Alliance accepts participation from all stakeholders, and we solicit clinically relevant proposals that will benefit the field at large. The initiative will dissolve once a clinical, interoperable, modularized, integrated solution (from tissue acquisition to diagnostic algorithm) has been implemented. In times of rapidly evolving discoveries, scientific input from subject-matter experts is one essential element to inform regulatory guidance and decision-making. 
The Alliance aims to establish and promote synergistic regulatory science efforts that will leverage diverse inputs to move digital pathology forward and ultimately improve patient care.

12.
NPJ Breast Cancer ; 6: 17, 2020.
Article in English | MEDLINE | ID: mdl-32411819

ABSTRACT

Stromal tumor-infiltrating lymphocytes (sTILs) are important prognostic and predictive biomarkers in triple-negative (TNBC) and HER2-positive breast cancer. Incorporating sTILs into clinical practice necessitates reproducible assessment. Previously developed standardized scoring guidelines have been widely embraced by the clinical and research communities. We evaluated sources of variability in sTIL assessment by pathologists in three previous sTIL ring studies. We identified common challenges and evaluated the impact of discrepancies on outcome estimates in early TNBC using a newly developed prognostic tool. Discordant sTIL assessment is driven by heterogeneity in lymphocyte distribution. Additional factors include: technical slide-related issues; scoring outside the tumor boundary; tumors with minimal assessable stroma; including lymphocytes associated with other structures; and including other inflammatory cells. Small variations in sTIL assessment modestly alter risk estimation in early TNBC but have the potential to affect treatment selection if cutpoints are employed. Scoring and averaging multiple areas, as well as use of reference images, improve consistency of sTIL evaluation. Moreover, to assist in avoiding the pitfalls identified in this analysis, we developed an educational resource available at www.tilsinbreastcancer.org/pitfalls.

13.
Acad Pathol ; 6: 2374289519859841, 2019.
Article in English | MEDLINE | ID: mdl-31321298

ABSTRACT

Validating digital pathology as a substitute for conventional microscopy in diagnosis remains a priority to assure effectiveness. Intermodality concordance studies typically focus on achieving the same diagnosis by digital display of whole slide images and conventional microscopy. Assessment of discrete histological features in whole slide images, such as mitotic figures, has not been thoroughly evaluated in diagnostic practice. To further gauge the interchangeability of conventional microscopy with digital display for primary diagnosis, 12 pathologists examined 113 canine naturally occurring mucosal melanomas exhibiting a wide range of mitotic activity. The study design reflected diverse diagnostic settings and investigated independent location, interpretation, and enumeration of mitotic figures. Intermodality agreement was assessed employing conventional microscopy (CM40×) and whole slide image specimens scanned at 20× (WSI20×) and at 40× (WSI40×) objective magnifications. In aggregate, 1647 mitotic figure count observations were available from conventional microscopy and whole slide images for comparison. The intraobserver concordance rate of paired observations was 0.785 to 0.801; the interobserver rate was 0.784 to 0.794. Correlation coefficients between the two digital modes, and as compared to conventional microscopy, were similar and suggest noninferiority among modalities, including whole slide images acquired at the lower 20× resolution. As mitotic figure counts serve for prognostic grading of several tumor types, including melanoma, 6 of 8 pathologists retrospectively predicted survival prognosis using whole slide images, compared to 9 of 10 by conventional microscopy, a first evaluation of whole slide images for mitotic figure prognostic grading. This study demonstrated agreement of replicate reads obtained across conventional microscopy and whole slide images. Hence, quantifying mitotic figures served as a surrogate histological feature with which to further credential the interchangeability of whole slide images for primary diagnosis.

14.
Diagn Pathol ; 14(1): 65, 2019 Jun 26.
Article in English | MEDLINE | ID: mdl-31238983

ABSTRACT

BACKGROUND: The establishment of whole-slide imaging (WSI) as a medical diagnostic device allows pathologists to evaluate mitotic activity with this new technology. Furthermore, image digitalization provides an opportunity to develop algorithms for automatic quantification, ideally improving reproducibility compared with naked-eye examination by pathologists. To implement such algorithms effectively, the accuracy of mitotic figure detection using WSI should be investigated. In this study, we aimed to measure pathologist performance in detecting mitotic figures (MFs) using multiple platforms (multiple scanners) and compare the results with those obtained using a brightfield microscope. METHODS: Four slides of canine oral melanoma were prepared and digitized using 4 WSI scanners. In these slides, 40 regions of interest (ROIs) were demarcated, and five observers identified the MFs using different viewing modes: microscopy and WSI. We evaluated the inter- and intra-observer agreement between modes with Cohen's kappa and determined "true" MFs with a consensus panel. We then assessed the accuracy (agreement with truth) using the average of sensitivity and specificity. RESULTS: In the 40 ROIs, 155 candidate MFs were detected by the five pathologists; 74 of them were determined to be true MFs. Inter- and intra-observer agreement was mostly "substantial" or greater (kappa = 0.594-0.939). Accuracy was between 0.632 and 0.843 across all readers and modes. After averaging over readers for each modality, we found that mitosis detection accuracy for 3 of the 4 WSI scanners was significantly less than that of the microscope (p = 0.002, 0.012, and 0.001). CONCLUSIONS: This study is the first to compare WSI and microscopy in detecting MFs at the level of individual cells. Our results suggest that WSI can be used for mitotic cell detection and offers similar reproducibility to the microscope, with slightly less accuracy.
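The two agreement measures used in this study can be sketched in a few lines. This is an illustrative implementation with toy data, not the authors' analysis code; the function names are invented for the example:

```python
def cohen_kappa(r1, r2):
    """Cohen's kappa for two readers' categorical calls on the same objects."""
    n = len(r1)
    po = sum(a == b for a, b in zip(r1, r2)) / n          # observed agreement
    labels = set(r1) | set(r2)
    pe = sum((r1.count(c) / n) * (r2.count(c) / n) for c in labels)  # chance agreement
    return 1.0 if pe == 1 else (po - pe) / (1 - pe)

def balanced_accuracy(truth, calls):
    """Accuracy as the average of sensitivity and specificity, as in the study."""
    tp = sum(1 for t, c in zip(truth, calls) if t and c)
    fn = sum(1 for t, c in zip(truth, calls) if t and not c)
    tn = sum(1 for t, c in zip(truth, calls) if not t and not c)
    fp = sum(1 for t, c in zip(truth, calls) if not t and c)
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))
```

Averaging sensitivity and specificity keeps the accuracy measure from being dominated by the majority class when true MFs are rare among candidates.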


Subject(s)
Dog Diseases/pathology , Melanoma/pathology , Mouth Neoplasms/pathology , Animals , Dog Diseases/drug therapy , Dogs , Image Interpretation, Computer-Assisted , Melanoma/diagnosis , Microscopy , Mitosis , Mouth Neoplasms/diagnosis , Observer Variation , Pathologists , Reproducibility of Results
15.
J Med Imaging (Bellingham) ; 6(1): 015501, 2019 Jan.
Article in English | MEDLINE | ID: mdl-30713851

ABSTRACT

We investigated effects of prevalence and case distribution on radiologist diagnostic performance as measured by area under the receiver operating characteristic curve (AUC) and sensitivity-specificity in lab-based reader studies evaluating imaging devices. Our retrospective reader studies compared full-field digital mammography (FFDM) to screen-film mammography (SFM) for women with dense breasts. Mammograms were acquired from the prospective Digital Mammographic Imaging Screening Trial. We performed five reader studies that differed in terms of cancer prevalence and the distribution of noncancers. Twenty radiologists participated in each reader study. Using split-plot study designs, we collected recall decisions and multilevel scores from the radiologists for calculating sensitivity, specificity, and AUC. Differences in reader-averaged AUCs slightly favored SFM over FFDM (largest AUC difference: 0.047, SE = 0.023, p = 0.047), where the standard error accounts for reader and case variability. The differences were not significant at a level of 0.01 (0.05/5 reader studies). The differences in sensitivities and specificities were also indeterminate. Prevalence had little effect on AUC (largest difference: 0.02), whereas sensitivity increased and specificity decreased as prevalence increased. We found that AUC is robust to changes in prevalence, while radiologists were more aggressive with recall decisions as prevalence increased.

16.
J Med Imaging (Bellingham) ; 5(3): 031410, 2018 Jul.
Article in English | MEDLINE | ID: mdl-29795776

ABSTRACT

The widely used multireader multicase ROC study design for comparing imaging modalities is the fully crossed (FC) design: every reader reads every case of both modalities. We investigate paired split-plot (PSP) designs that may allow for reduced cost and increased flexibility compared with the FC design. In the PSP design, case images from two modalities are read by the same readers, so the readings are paired across modalities. However, within each modality, not every reader reads every case. Instead, both the readers and the cases are partitioned into a fixed number of groups and each group of readers reads its own group of cases, a split-plot design. Using a U-statistic based variance analysis for AUC (i.e., area under the ROC curve), we show analytically that precision can be gained by the PSP design as compared with the FC design with the same number of readers and readings. Equivalently, we show that the PSP design can achieve the same statistical power as the FC design with a reduced number of readings. The trade-off for the increased precision in the PSP design is the cost of collecting a larger number of truth-verified patient cases than the FC design. This means that one can trade off between different sources of cost and choose a least burdensome design. We provide a validation study to show the iMRMC software can be reliably used for analyzing data from both FC and PSP designs. Finally, we demonstrate the advantages of the PSP design with a reader study comparing full-field digital mammography with screen-film mammography.
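The reading-count arithmetic behind the FC-versus-PSP trade-off can be made concrete. This is a simple sketch with illustrative numbers (the reader, case, and group counts are hypothetical, not taken from the study):

```python
def total_readings(readers, cases, modalities=2, groups=1):
    """Number of readings collected. groups == 1 is the fully crossed design;
    groups > 1 is a split-plot design in which readers and cases are
    partitioned into equal groups and each reader group reads only its own
    case group (readings remain paired across modalities)."""
    assert readers % groups == 0 and cases % groups == 0
    return readers * (cases // groups) * modalities

fc = total_readings(readers=8, cases=400)             # fully crossed
psp = total_readings(readers=8, cases=400, groups=4)  # paired split-plot
```

With 4 groups, each reader reads a quarter of the cases, so the PSP workload is a quarter of the FC workload at the cost of needing all 400 truth-verified cases.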

17.
Article in English | MEDLINE | ID: mdl-28845078

ABSTRACT

The FDA recently completed a study on reader-study design methodologies, the Validation of Imaging Premarket Evaluation and Regulation (VIPER) study. VIPER consisted of five large reader sub-studies to compare the impact of different study populations on reader behavior as seen by sensitivity, specificity, and AUC, the area under the ROC (receiver operating characteristic) curve. The study investigated different prevalence levels and two kinds of sampling of non-cancer patients: a screening population and a challenge population. The VIPER study compared full-field digital mammography (FFDM) to screen-film mammography (SFM) for women with heterogeneously dense or extremely dense breasts. All cases and corresponding images were sampled from Digital Mammographic Imaging Screening Trial (DMIST) archives. There were 20 readers (American Board certified radiologists) for each sub-study, and instead of every reader reading every case (fully crossed study), readers and cases were split into groups to reduce reader workload and the total number of observations (split-plot study). For data collection, readers first decided whether or not they would recall a patient. Following that decision, they provided an ROC score for how close or far that patient was from the recall decision threshold. Performance results for FFDM show that as prevalence increases to 50%, there is a moderate increase in sensitivity and decrease in specificity, whereas AUC is mainly flat. Regarding precision, the statistical efficiency (ratio of variances) of sensitivity and specificity relative to AUC is 0.66 at best and decreases with prevalence. Analyses comparing modalities and the study populations (screening vs. challenge) are still ongoing.

18.
Article in English | MEDLINE | ID: mdl-28794577

ABSTRACT

The purpose of this work is to present and evaluate methods based on U-statistics to compare intra- or inter-reader agreement across different imaging modalities. We apply these methods to multi-reader multi-case (MRMC) studies. We measure reader-averaged agreement and estimate its variance accounting for the variability from readers and cases (an MRMC analysis). In our application, pathologists (readers) evaluate patient tissue mounted on glass slides (cases) in two ways. They evaluate the slides on a microscope (reference modality) and they evaluate digital scans of the slides on a computer display (new modality). In the current work, we consider concordance as the agreement measure, but many of the concepts outlined here apply to other agreement measures. Concordance is the probability that two readers rank two cases in the same order. Concordance can be estimated with a U-statistic and thus it has some nice properties: it is unbiased, asymptotically normal, and its variance is given by an explicit formula. Another property of a U-statistic is that it is symmetric in its inputs; it doesn't matter which reader is listed first or which case is listed first, the result is the same. Using this property and a few tricks while building the U-statistic kernel for concordance, we get a mathematically tractable problem and efficient software. Simulations show that our variance and covariance estimates are unbiased.
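The concordance kernel described here can be sketched directly. This is an illustrative estimator assuming ordinal scores from two readers on the same cases, with ties contributing 0.5; it is not the authors' software:

```python
from itertools import combinations

def concordance(scores1, scores2):
    """U-statistic estimate of the probability that two readers rank a
    random pair of cases in the same order; a tie in either reader's
    scores contributes 0.5."""
    num, den = 0.0, 0
    for i, j in combinations(range(len(scores1)), 2):
        d1 = scores1[i] - scores1[j]
        d2 = scores2[i] - scores2[j]
        num += 1.0 if d1 * d2 > 0 else 0.5 if d1 == 0 or d2 == 0 else 0.0
        den += 1
    return num / den
```

Note the symmetry property mentioned in the abstract: swapping the two readers, or relabeling the cases, leaves the estimate unchanged, since the kernel depends only on unordered pairs.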

20.
IEEE Trans Med Imaging ; 34(2): 453-64, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25265629

ABSTRACT

Task-based assessments of image quality constitute a rigorous, principled approach to the evaluation of imaging system performance. To conduct such assessments, it has been recognized that mathematical model observers are very useful, particularly for purposes of imaging system development and optimization. One type of model observer that has been widely applied in the medical imaging community is the channelized Hotelling observer (CHO), which is well-suited to known-location discrimination tasks. In the present work, we address the need for reliable confidence interval estimators of CHO performance. Specifically, we show that the bias associated with point estimates of CHO performance can be overcome by using confidence intervals proposed by Reiser for the Mahalanobis distance. In addition, we find that these intervals are well-defined with theoretically-exact coverage probabilities, which is a new result not proved by Reiser. The confidence intervals are tested with Monte Carlo simulation and demonstrated with two examples comparing X-ray CT reconstruction strategies. Moreover, commonly-used training/testing approaches are discussed and compared to the exact confidence intervals. MATLAB software implementing the estimators described in this work is publicly available at http://code.google.com/p/iqmodelo/.
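A point estimate of CHO performance of the kind whose bias is discussed above can be sketched as the sample Mahalanobis distance squared between the two classes of channel outputs. This is a generic sketch under standard CHO assumptions (known-location task, pooled class covariance), not the paper's estimator or the linked MATLAB code:

```python
import numpy as np

def cho_snr2(chan_pos, chan_neg):
    """Plug-in estimate of squared CHO detectability: the squared Mahalanobis
    distance between signal-present and signal-absent channel outputs.
    chan_pos, chan_neg: arrays of shape (n_samples, n_channels)."""
    dmu = chan_pos.mean(axis=0) - chan_neg.mean(axis=0)
    s = 0.5 * (np.cov(chan_pos, rowvar=False) + np.cov(chan_neg, rowvar=False))
    return float(dmu @ np.linalg.solve(s, dmu))

# Simulated channel outputs: unit mean shift in the first of 4 channels.
rng = np.random.default_rng(0)
neg = rng.standard_normal((500, 4))
pos = rng.standard_normal((500, 4)) + np.array([1.0, 0.0, 0.0, 0.0])
snr2 = cho_snr2(pos, neg)
```

Because the covariance is estimated from finite samples, this plug-in value is biased upward, which is the motivation for the exact confidence intervals discussed in the abstract.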


Subject(s)
Algorithms , Discriminant Analysis , Image Processing, Computer-Assisted/methods , Computer Simulation , Humans , Models, Biological , Monte Carlo Method , Phantoms, Imaging , Signal-To-Noise Ratio , Tomography, X-Ray Computed , Torso/diagnostic imaging